Deep learning neural network image analysis of immunohistochemical protein expression reveals a significantly reduced expression of biglycan in breast cancer

New breast cancer biomarkers have been sought for better tumor characterization and treatment. Among these putative markers is biglycan (BGN), a class I member of the small leucine-rich proteoglycan (SLRP) family, characterized by a protein core with leucine-rich repeats. The objective of this study is to compare BGN protein expression in breast tissue with and without cancer, using immunohistochemistry combined with digital histological scoring (D-HSCORE) and supervised deep learning neural networks (SDLNN). In this case-control study, 24 formalin-fixed, paraffin-embedded tissues were obtained for analysis. Normal (n = 9) and cancerous (n = 15) tissue sections were analyzed by immunohistochemistry using a BGN monoclonal antibody (M01, Abnova) with 3,3'-diaminobenzidine (DAB) as the chromogen. Photomicrographs of the slides were analyzed with D-HSCORE in arbitrary DAB units. Another set of photomicrographs (n = 129), taken at higher magnification without region of interest (ROI) selection, was submitted to the InceptionV3 deep neural network image embedding model. Supervised neural network analysis, with stratified 20-fold cross-validation, 200 hidden layers, ReLU activation, and regularization at α = 0.0001, was then applied for SDLNN. The sample size was calculated as a minimum of 7 cases and 7 controls, for a power of 90%, an α error of 5%, and a standard deviation of 20, to detect a decrease from a mean of 40 DAB units (control) to 4 DAB units in cancer. Using D-HSCORE, BGN expression in DAB units [median (range)] was 6.2 (0.8 to 12.4) in cancer and 27.31 (5.3 to 81.7) in normal breast tissue (p = 0.0017, Mann-Whitney test). SDLNN classification accuracy was 85.3% (110 of 129; 95% CI = 78.1% to 90.3%). BGN protein expression is reduced in breast cancer tissue compared with normal tissue.


Introduction
In women, breast cancer is the most commonly diagnosed cancer worldwide and the leading cause of cancer death [1]. Breast cancers are heterogeneous at both the histological and molecular levels, and molecular profiling guides diagnostic and therapeutic strategies for the disease [2]. New breast cancer biomarkers have been sought to better characterize tumors and to select the best possible treatment [3]. Small leucine-rich proteoglycans (SLRPs), a diverse subgroup of proteoglycans, are involved in matrix organization and in the regulation of cell growth and signaling [4]. Biglycan (BGN) is an SLRP whose gene has been mapped to chromosome Xq28 [5]. BGN is a class I SLRP characterized by a protein core with leucine-rich repeats; it consists of 331 amino acids and has a molecular weight of 42 kDa [6]. When fully glycosylated, its molecular weight increases to 100-250 kDa. This glycosylation is due to two chondroitin sulfate/dermatan sulfate glycosaminoglycan chains covalently attached to the N-terminal region [7]. These glycosaminoglycan chains consist of repeating disaccharide units of either chondroitin sulfate or dermatan sulfate and are attached to the core protein via an O-linked glycosidic bond. This proteoglycan is ubiquitously expressed; it can be incorporated into the extracellular matrix (ECM) or, under certain disease conditions, circulate in the blood in a soluble form [8,9].
BGN could alter tumor proliferation by modulating receptors and molecules expressed by cells within the tumor microenvironment [9]. Using a cancer microarray database and web-based data-mining platform (Oncomine), Zhao et al. reported that BGN gene expression was upregulated in breast and other cancers [10]. However, the clinical impact of BGN on cancer is still poorly understood and sometimes contradictory. For instance, in bladder cancer, silencing BGN enhanced tumor cell proliferation, indicating that BGN acts as a growth suppressor in this disease [11], whereas a study using an animal model found that inhibiting stromal BGN promoted normalization of the tumor microenvironment and enhanced chemotherapeutic efficacy in mice injected with breast cancer cells [12]. Bischof et al. demonstrated that injecting normal early-stage embryonic mesenchyme was sufficient to induce differentiation and suppress the growth of mouse mammary tumor epithelial cells both in vitro and in vivo, and that BGN was required for this tumor normalization [13].
These apparent contradictions may be explained by the different models used (in vitro or animal) and detection methods (mRNA or immunohistochemistry). The type of antibody used for immunohistochemistry and the specific site used to identify BGN may also explain some of these discrepancies. For instance, mature and functionally active BGN protein was detected with a polyclonal antibody after the glycosaminoglycan chains were removed by enzymatic digestion with chondroitinase ABC [14]. This removal was performed because the two chondroitin/dermatan sulfate glycosaminoglycan chains could hinder antibody binding and lead to misinterpretation of results [14]. However, this polyclonal antibody has since been discontinued, so it cannot be used in new studies. Discrepancies become more evident when different antibodies are compared in the same tissue. For instance, the Human Protein Atlas database [15] lists two antibodies validated against BGN for immunohistochemistry: HPA003157 (Sigma-Aldrich), a polyclonal antibody that targets 140 amino acids, and H00000633-M01 (Abnova, Taipei, Taiwan), a monoclonal antibody against the full sequence of BGN (368 amino acids). BGN protein expression in breast cancer differs between these two antibodies: with the former, 8.3% of cases were negative (1 of 12), whereas with the latter, 75% were negative (9 of 12) [16]. In addition, manual evaluation of immunohistochemically stained specimens is a subjective, observer-dependent task that, as reported by others, is subject to intra- and inter-observer variability [17].
To reduce the highly subjective and easily biased nature of this task, the digital histological scoring method (D-HSCORE) has been reported [18]. Another emerging area in image analysis is deep learning (DL), a form of machine learning that can use both supervised and unsupervised learning; DL applied to digital pathology uses artificial neural networks (ANNs) to determine whether the output or interpretation of a digital image is correct [19]. ANNs use multiple layers of computation, imitating the complex network of neurons in the human brain, to analyze such complex data [19].
Data on BGN expression in human breast cancer are scant and contradictory [13,20], and little information is available on human BGN protein expression in vivo using an antibody against the full length of the protein. Therefore, the objective of this study is to compare the immunohistochemical expression of BGN in breast cancer biopsies with that in normal breast tissue, using a validated monoclonal antibody and two digital image analysis methods: D-HSCORE and deep learning neural network image analysis.

Ethics statement
This study was submitted to and approved by the Hospital de Clínicas de Porto Alegre Ethical Review Board under approval number 2019/0337, and was registered at Plataforma Brasil under the certificate of submission for ethical analysis (CAAE 15329119.9.0000.5327).

Study design and setting
In this case-control study, paraffin blocks were obtained from the pathology archive of Hospital de Clínicas in Porto Alegre, Brazil. Slides were dated between January 1, 2012, and December 30, 2015. The original pathology report was reviewed by a board-certified pathologist to confirm the diagnosis of benign or cancerous breast tissue. The study was conducted between May 20, 2019, and July 30, 2020.

Patients and methods
Women diagnosed with invasive ductal carcinoma and women who underwent breast surgery for benign conditions (e.g., mammoplasty, benign mammary cyst) were included in the sample. Patients with lobular carcinoma or intraductal papilloma, patients who had undergone chemotherapy or radiotherapy, and patients aged below 20 or over 79 years were excluded. Patients treated with chemotherapy or radiotherapy were excluded because these treatments may change the protein expression of the tumor.

Variables
BGN protein expression, measured in arbitrary DAB units ranging from 0 to 255, was the primary continuous variable.
Group (healthy breast tissue as the benign control group, or breast cancer tissue) was a categorical variable. Other variables were age and ethnicity. For cancerous tissues, estrogen and progesterone receptor status, human epidermal growth factor receptor 2 (HER2) status, Nottingham grade, and tumor stage were also described.

Data sources / measurement
Immunohistochemistry. Immunohistochemistry was performed according to a standard technique [21] and as previously reported by our group, with minor modifications [22]. The modifications were as follows: sections were deparaffinized at 75 °C for two hours, rinsed in xylene (xylol), and rehydrated through successive steps of ethanol, water, and phosphate-buffered saline (PBS); for antigen retrieval, slides were incubated in sodium citrate solution (pH 6) at 90 °C for 45 min. The primary antibody, raised against full-length recombinant BGN, was the BGN monoclonal antibody (M01), clone 4E1-1G7, IgG2a kappa (Abnova, Taipei, Taiwan), used at a 1:1000 dilution at pH 6 and incubated overnight. After two 5-minute rinses in PBS, the secondary antibody, a peroxidase-conjugated anti-mouse IgG (whole molecule) antibody produced in rabbit (A9044, Sigma-Aldrich, Darmstadt, Germany), was applied at a 1:1000 dilution and incubated for 90 minutes at 22 °C in the same chamber. Visualization of the primary antibody and counterstaining were performed as previously reported [22]. Negative controls were obtained by replacing the primary antibody with a mouse IgG2a, kappa monoclonal isotype control [18C8BC7AD10] (ab170191; Abcam, Cambridge, UK). Human lung cancer samples were used as an external positive control. These procedures followed the REporting recommendations for tumor MARKer prognostic studies (REMARK) guidelines [23].
Images of stained sections were obtained using an optical microscope (Olympus BX51; Olympus Optical Co., Tokyo, Japan) with a 40x U Plan Fluorite dry objective (numerical aperture 0.65; Olympus). A digital color camera (Olympus DP73; OM Digital Solutions Co., Tokyo, Japan) captured digital images of 4800 x 3600 pixels (resolution: 1 mm = 6000 pixels) under standard conditions for ImageJ analysis. Another set of images (n = 129 photomicrographs) was taken across the entire slides with a 100x oil-immersion objective (UPLFL 100x; Olympus) for supervised deep learning neural network analysis.

Image analysis with ImageJ
Photomicrographs were coded and analyzed blindly using the digital HSCORE (D-HSCORE), as previously reported [18,24,25]. Briefly, only the glandular and tumor areas of the tissue sections were selected as regions of interest (ROI). The selected ROIs were then processed with color deconvolution, and the resulting DAB-channel image was used for analysis.
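For readers who want to reproduce the color deconvolution step outside ImageJ, the following is a minimal Python sketch using scikit-image's `rgb2hed`, which separates hematoxylin, eosin, and DAB with the Ruifrok and Johnston stain vectors. It is an assumption that these vectors match the ImageJ setting used in this study, and `rgb2hed` returns optical-density-like values rather than ImageJ's 8-bit output, so its numbers are not interchangeable with the DAB units reported here. The function name `roi_dab_od` and the synthetic demo tiles are hypothetical.

```python
import numpy as np
from skimage.color import hed2rgb, rgb2hed


def roi_dab_od(rgb_image: np.ndarray, roi_mask: np.ndarray) -> float:
    """Mean DAB signal (optical-density-like units) inside a region of interest.

    rgb_image: H x W x 3 RGB array (uint8, or float in [0, 1]).
    roi_mask:  H x W boolean array, True for glandular or tumor pixels.
    """
    if roi_mask.shape != rgb_image.shape[:2]:
        raise ValueError("ROI mask must match the image height and width")
    if not roi_mask.any():
        raise ValueError("ROI mask selects no pixels")
    # Color deconvolution into hematoxylin (0), eosin (1) and DAB (2) channels
    hed = rgb2hed(rgb_image)
    dab = hed[..., 2]
    # Higher values mean more DAB absorbance (the opposite of ImageJ's 8-bit scale)
    return float(dab[roi_mask].mean())


# Synthetic demo: a pure-DAB tile built with hed2rgb versus an unstained white tile
stains = np.zeros((4, 4, 3))
stains[..., 2] = 0.5
brown = hed2rgb(stains)
white = np.ones((4, 4, 3))
mask = np.ones((4, 4), dtype=bool)
print(roi_dab_od(brown, mask), roi_dab_od(white, mask))
```

Because the scales differ, a sketch like this is useful for checking the direction of staining differences, not for reproducing the exact DAB-unit values.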

Supervised deep learning neural network
The 129 DAB-only photomicrographs, taken at 100x magnification without ROI selection, were submitted to the InceptionV3 deep neural network image embedding model in Orange 3.31.0 (University of Ljubljana, Slovenia). The resulting embeddings were then classified by supervised neural network analysis (SDLNN) in Orange, using stratified 20-fold cross-validation, 200 hidden layers, ReLU activation, and regularization at α = 0.0001.

Bias
Observer bias was reduced by using software-based quantification of coded images (D-HSCORE) and SDLNN classification instead of manual scoring.

Study size
The sample size for the ImageJ analysis was calculated as described in the literature [26] to achieve a power of 90% and an α error of 5%, assuming a standard deviation of 20 DAB units, to detect a decrease from a mean of 40 DAB units (control) to 4 DAB units in cancer. With these figures, at least 7 samples per group were required.
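The reported minimum of 7 per group can be checked with the usual normal-approximation formula for comparing two means, n = 2(z(1-α/2) + z(1-β))² σ² / Δ², with Δ = 40 - 4 = 36 DAB units and σ = 20. Reference [26] may use a different method (for example, one based on the t distribution), which can give a slightly larger number; the sketch below assumes the normal approximation, and `n_per_group` is a hypothetical name.

```python
import math

from scipy.stats import norm


def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * (z_a + z_b)^2 * sd^2 / delta^2,
    rounded up to the next whole sample.
    """
    z_alpha = norm.ppf(1 - alpha / 2)  # about 1.960 for alpha = 0.05
    z_beta = norm.ppf(power)           # about 1.282 for power = 0.90
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)


print(n_per_group(delta=40 - 4, sd=20))  # 7 (unrounded value is about 6.49)
```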
The sample size for the supervised neural network analysis was a convenience sample: the maximum number of photomicrographs that could be obtained from the slides.

Quantitative variables
The average DAB intensity, derived from up to three images obtained by color deconvolution, was calculated with the formula f = 255 - i, where f is the final DAB intensity and i is the mean DAB intensity reported by the software, as previously described [18].
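A minimal sketch of this calculation, assuming each input is an 8-bit DAB-channel image from color deconvolution in which 255 means no staining, and assuming the optional boolean masks mark the glandular or tumor ROI. The function name `d_hscore` is hypothetical, and because the text does not say whether background pixels were excluded from the mean, every ROI pixel is averaged here.

```python
import numpy as np


def d_hscore(dab_images, roi_masks=None) -> float:
    """Final DAB intensity f = 255 - i, averaged over up to three images.

    dab_images: list of 8-bit DAB-channel images (255 = no stain).
    roi_masks:  optional list of boolean masks selecting glandular/tumor pixels.
    """
    if not 1 <= len(dab_images) <= 3:
        raise ValueError("expected between one and three images")
    if roi_masks is None:
        roi_masks = [None] * len(dab_images)
    if len(roi_masks) != len(dab_images):
        raise ValueError("need one mask (or None) per image")
    scores = []
    for img, mask in zip(dab_images, roi_masks):
        pixels = np.asarray(img, dtype=float)
        if mask is not None:
            pixels = pixels[mask]
        i = pixels.mean()         # mean DAB intensity for this image
        scores.append(255.0 - i)  # invert so stronger staining gives a larger value
    return float(np.mean(scores))


# Two uniform images with mean intensities 230 and 240 give f = 25 and 15
print(d_hscore([np.full((2, 2), 230, np.uint8), np.full((2, 2), 240, np.uint8)]))  # 20.0
```

Since f is linear in i, averaging f across images gives the same result as computing f from the averaged i.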

Statistical analysis
Categorical data were compared between groups using Fisher's exact test. Continuous data on BGN expression, in arbitrary DAB units, were compared between groups using an unpaired Student's t-test with Welch's correction if the data had a Gaussian distribution and different SDs; otherwise, the Mann-Whitney test was used. The D'Agostino & Pearson omnibus normality test was used to assess Gaussian distribution. These analyses were performed using GraphPad Prism version 9.3.1 for Macintosh (GraphPad Software Inc., San Diego, CA).
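The decision rule above can be sketched with SciPy, whose `normaltest` implements the D'Agostino and Pearson omnibus test. This is not the Prism workflow itself: as a simplifying assumption, the sketch applies Welch's t-test whenever both groups pass the normality test, without a separate check for unequal SDs. The function names are hypothetical. Note that `normaltest` needs at least 8 values per group, which the benign group (n = 9) just meets.

```python
import numpy as np
from scipy import stats


def compare_dab_units(cancer, benign, alpha: float = 0.05):
    """Return (test name, two-sided p-value) for BGN expression in DAB units."""
    cancer = np.asarray(cancer, dtype=float)
    benign = np.asarray(benign, dtype=float)
    # D'Agostino & Pearson omnibus normality test on each group
    gaussian = all(stats.normaltest(g).pvalue > alpha for g in (cancer, benign))
    if gaussian:
        # Welch's t-test does not assume equal SDs
        result = stats.ttest_ind(cancer, benign, equal_var=False)
        return "Welch t-test", float(result.pvalue)
    # Non-Gaussian data: rank-based Mann-Whitney U test
    result = stats.mannwhitneyu(cancer, benign, alternative="two-sided")
    return "Mann-Whitney U", float(result.pvalue)


def compare_categorical(table_2x2) -> float:
    """Two-sided Fisher's exact test on a 2 x 2 table of counts; returns the p-value."""
    _, p = stats.fisher_exact(table_2x2)
    return float(p)
```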
Deep neural network image embedding and recognition model analysis was performed using Orange 3.31.0 software (University of Ljubljana, Slovenia). The supervised neural network was set with 200 hidden layers, the rectified linear unit (ReLU) activation function, Adam as the optimization algorithm [27], regularization at α = 0.0001, and stratified 20-fold cross-validation.
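Orange's Neural Network learner is built on scikit-learn's `MLPClassifier`, so the configuration above can be approximated directly in scikit-learn. The sketch assumes the InceptionV3 embeddings have already been exported (for example, from Orange's Image Embedding widget) as a feature matrix with one row per photomicrograph. It reads "200 hidden layers" as Orange's "Neurons in hidden layers" field set to 200, i.e. a single hidden layer of 200 neurons; this interpretation, the iteration limit, the random seed, and the absence of feature scaling are all assumptions, and `evaluate_embeddings` is a hypothetical name.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neural_network import MLPClassifier


def evaluate_embeddings(X: np.ndarray, y: np.ndarray, seed: int = 0) -> dict:
    """Stratified 20-fold cross-validation of an MLP on precomputed image embeddings.

    X: n_images x n_features embedding matrix (e.g., InceptionV3 vectors).
    y: binary labels (0 = benign, 1 = cancer).
    """
    model = MLPClassifier(
        hidden_layer_sizes=(200,),  # assumption: one hidden layer of 200 neurons
        activation="relu",
        solver="adam",
        alpha=1e-4,                 # L2 regularization strength
        max_iter=200,
        random_state=seed,
    )
    cv = StratifiedKFold(n_splits=20, shuffle=True, random_state=seed)
    # Out-of-fold probabilities: each image is scored by a model that never saw it
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")
    y_pred = proba.argmax(axis=1)   # column order follows the sorted labels [0, 1]
    return {
        "n_correct": int((y_pred == y).sum()),
        "accuracy": accuracy_score(y, y_pred),
        "auc": roc_auc_score(y, proba[:, 1]),
    }
```

Called with the 129 embeddings and their labels (69 benign, 60 cancer), this returns pooled out-of-fold accuracy and AUC; these may differ slightly from the figures Orange reports, depending on how it aggregates folds.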

Participants and descriptive data
A total of 24 samples were obtained for the study: benign, n = 9; cancerous, n = 15. The difference in mean age between groups was not significant (p = 0.1; unpaired Student's t-test). Details of the sampled population are shown in Table 1.

Supervised neural network analysis
A total of 129 high-power (100x) DAB-only photomicrographs derived from ImageJ (benign, n = 69; cancer, n = 60) were submitted for supervised neural network analysis (Fig 2).
The supervised neural network analysis yielded an area under the curve (AUC) of 94.3%; further performance details are shown in Table 2.
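As a consistency check on the classification accuracy reported in the abstract (110 of 129 photomicrographs correctly classified), a Wilson score interval can be computed. The CI method the authors used is not stated, so Wilson is an assumption here; it reproduces the lower bound of 78.1% but gives an upper bound of about 90.4% rather than 90.3%, a difference that may come from the method or from rounding. `wilson_ci` is a hypothetical name.

```python
import math

from scipy.stats import norm


def wilson_ci(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    z = norm.ppf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


low, high = wilson_ci(110, 129)
print(f"accuracy = {110 / 129:.1%}, 95% CI = {low:.1%} to {high:.1%}")
# accuracy = 85.3%, 95% CI = 78.1% to 90.4%
```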

Discussion
BGN expression, detected with the M01 antibody (clone 4E1-1G7) under the methodological conditions described here, was located mainly in the cytoplasm and the extracellular matrix. This localization is in accordance with previous reports: different authors have found BGN protein expression in gastric cancer mainly in the cytoplasm of epithelial cells [28,29].
In breast cancer tissue, BGN expression was significantly lower than in normal breast tissue. These results are in accordance with those published by Bischof et al., who reported that adding BGN to an in vitro model can reverse neoplastic progression and 'reboot' breast cancer cells [13]. Nevertheless, our results differ from those published by others [10,12]. Possible explanations for these discrepancies relate to a) the number of cases, b) the type of model used, c) the type of antibody, and d) the biological effect of BGN. Only two cases appear to have been assessed in the Human Protein Atlas [16], whereas our sample included nine normal breast tissue samples. The standard deviation of BGN protein expression in normal tissue was high: two cases had low levels of expression (5.39 and 8 DAB units), while one case reached 81 DAB units (Fig 1). The use of different species may also explain these differences. Cong et al. noted that suppressing stromal BGN may yield a potent and superior anticancer effect in breast cancer induced in BGN knockout mice, compared with wild-type mice [12]. Whereas Cong et al. used a knockout mouse model, we used human breast cancer tissue. In addition, the data reported by Zhao et al. were based on mRNA from the Oncomine database [10]. Another explanation for the lower levels of BGN protein expression in breast cancer tissue may be the full-length monoclonal antibody used here. Antibodies that recognize different segments of the BGN protein may yield different outcomes because the chondroitin/dermatan sulfate glycosaminoglycan side chains may hinder antibody binding, leading to the misinterpretation of results [14]. Finally, the amount of BGN in tissue sections does not necessarily reflect its biological effect, as it mainly indicates BGN that has been sequestered in the extracellular matrix, for example as part of a fibrotic scar [30]. Despite these disagreements, the difference in protein expression between benign and malignant breast tissue suggests that the local microenvironment, including the ECM, may play an important role in controlling cell growth, survival, and fate determination [20].

PLOS ONE
This study has some limitations. We neither analyzed subgroups nor performed any mechanistic experiments. In addition, we do not know whether BGN expression differs across the menstrual cycle, which might explain the variability of BGN expression in normal breast tissue. Finally, we did not compare different antibodies side by side to identify putative differences.
The results here are strengthened by several aspects. The use of ImageJ software reduced the subjective bias of DAB quantification. The artificial intelligence classified 129 photomicrographs based only on DAB expression with a high degree of accuracy, confirming the results obtained with ImageJ analysis. Combining both techniques is not yet widespread, but it is promising. The criteria the artificial intelligence used to classify the slides are not completely understood, and it likely used factors beyond DAB expression. However, it is unlikely that the shape of the cells seen in the DAB images had a major influence on the classification. The use of a monoclonal antibody, and of a non-specific primary antibody as a negative control, reflects the care taken in the quality control of the immunohistochemical methodology; a non-specific primary antibody is considered a better negative control than omission of the primary antibody. Lung cancer was used as external positive and negative controls. External validity can be expected if the same methodology is applied.
Building on these results, further studies may investigate the use of BGN as a biomarker or as a prognostic factor in breast cancer. The functional significance and role of BGN alterations in breast tumorigenesis and progression remain to be determined.